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Abstract 

We consider tomographic reconstruction using priors in the form of 
a dictionary learned from training images. The reconstruction has two 
stages: first we construct a tensor dictionary prior from our training data, 
and then we pose the reconstruction problem in terms of recovering the 
expansion coefficients in that dictionary. Our approach differs from past 
approaches in that a) we use a third-order tensor representation for our 
images and b) we recast the reconstruction problem using the tensor for¬ 
mulation. The dictionary learning problem is presented as a non-negative 
tensor factorization problem with sparsity constraints. The reconstruction 
problem is formulated in a convex optimization framework by looking for a 
solution with a sparse representation in the tensor dictionary. Numerical 
results show that our tensor formulation leads to very sparse represen¬ 
tations of both the training images and the reconstructions due to the 
ability of representing repeated features compactly in the dictionary. 


1 Introduction 

Tomographic image reconstruction from noisy incomplete projection data at 
few views or in a limited-angle setting is a common challenge in computed 
tomography, yet classical methods such as filtered back-projection (FBP) are 
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well known to fail (see, e.g., [4] and [17]). To compute a robust solution it 
is necessary to incorporate adequate prior information into the mathematical 
reconstruction formulation. Total Variation (TV) regularization can be suited 
for edge-preserving imaging problems in low dose and/or few-view data sets [32]; 
but in the presence of noise TV tends to oversmooth textures in natural images 

[40]. 

“Training images” in the form of a carefully chosen set of images can be used 
to represent samples of the prior — such as texture — for the desired solution. 
Clearly, such images must be reliable and application-specific. A suitable way 
to incorporate the prior information from training images and to extract the 
prototype features of such images is to form a dictionary from the images such 
that they are reproducible from a limited (aka sparse) linear combination of 
those images. The dictionary is then used for representation of other images 
with similar features. Such “sparse coding” of natural images was introduced 
by Olshausen and Field [37]. 

Recent development in the theory and computational tools for sparse repre¬ 
sentation of signals and images (see, e.g., [15] and [7]) has enabled us to analyze 
massive training data. Dictionary learning methods are now widely used to 
compute basis elements and learn sparse representations of training signals and 
images (see, e.g., [1] and [35]). Likewise, sparse representation in terms of such 
dictionaries has attracted increased interest in solving imaging problems such 
as denoising [14], deblurring [34], and restoration [36], in addition to solving 
tomographic image reconstruction problems [39, 43]. 

One common feature in the aforementioned papers on dictionary learning 
and sparse representation in terms of these dictionaries is the reliance on the 
(invertible) mapping of 2D images to vectors and subsequent use of a linear 
algebraic framework: Matrices are used for the dictionary representation (the 
columns represent vectorized forms of image feature) and the use of a linear 
combination of the columns of the dictionary gives the expression of the image, 
in its vectorized form. However, the training data itself can be more naturally 
represented as a multidimensional array, called a tensor. For example, a collec¬ 
tion of K gray-scale images of size M x N could be arranged in an M x K x N 
array, also known as a third-order tensor. Recent work in imaging applications 
such as facial recognition [22] and video completion [44] has shown that using 
the right kind of factorizations of particular tensor-based representations of the 
data can have a distinct advantage over matrix-based counterparts. For this 
reason, in this paper we will develop a fundamentally new approach for both 
the dictionary learning and image reconstruction tasks that is based on a par¬ 
ticular type of tensor decomposition based around the t-product introduced in 

[29]. 

There are several different tensor factorizations and decompositions such as 
CANDECOMP/PARAFAC (CP) [28] and Tucker decomposition [41]. The use 
of different decompositions is driven by applications as well as the properties 
of the decompositions. For an extensive list of tensor decompositions, their 
applications, and further references, see [31]. Some recent works provide algo¬ 
rithms and analysis for tensor sparse coding and dictionary learning based on 
different factorization strategies. Caiafa and Cichocki [9] discuss multidimen¬ 
sional compressed sensing algorithms using the Tucker decomposition. Zubair 
and Wang [45] propose a tensor learning algorithm based on the Tucker model 
with a sparsity constraint on the core tensor. Tensor-based extensions of the 
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method of optimal directions (MOD) [16] and the KSVD algorithm [1] have 
been studied in [38] for separable multidimensional dictionaries. An algorithm 
for tensor dictionary learning based on the CP decomposition, called K-CPD, 
is presented in [13]. 

Recent work by Kilmer et al. [30] sets up a new theoretical framework which 
facilitates a straightforward extension of matrix factorizations to third-order 
tensors based on a new tensor multiplication definition, called the t-product. 
The motivation for our work is to use the t-product as a natural extension for 
the dictionary learning problem and image reconstruction in a third-order tensor 
formulation with the factorization based on the framework in [29] and [30]. 

The goal of this paper is to re-visit the dictionary learning approach in [39] 
for X-ray CT reconstruction, now using a tensor formulation of the problem. As 
we will show, the new formulation is not a trivial reformulation of the matrix- 
based method, and we demonstrate that in the tensor formulation we obtain a 
much sparser representation of both the dictionary and the reconstruction. 

This paper is organized as follows. We first establish basic notation and 
definitions in Section 2. In Section 3 we briefly discuss the dictionary learning 
problem and the reconstruction problem using dictionaries. In Section 4 we 
describe the tensor dictionary learning problem and the corresponding algorithm 
based on the alternating direction method of multipliers. Then, in section 5 we 
utilize the learned tensor dictionary to formulate the reconstruction problem 
as a regularized inverse problem. Numerical results are presented in Section 6 
demonstrating the performance and parameter choice of our algorithm. 

2 Notations and Preliminaries on Tensors 

In this section we present the definitions and notations that will be used through¬ 
out this paper. We exclusively consider the tensor definitions and the tensor 
product notation introduced in [29] and [30]. Throughout the paper, a capital 
italics letter such as A denotes a matrix and a capital calligraphy letter such as 
A denotes a tensor. 

A tensor is a multidimensional array of numbers. The order of a tensor 
refers to its dimensionality. Thus, if A G M lxmxn then we say A is a third-order 
tensor. A 1 x 1 x n tensor is called a tube fiber. A graphical illustration of a 
third-order tensor decomposed into its tube fibers is given in the upper right 
image of Fig. 1. Thus, one way to view a third-order tensor is as a matrix of 
tube fibers. In particular, an £ x 1 x n tensor is a vector of tube fibers. To 
make this clear, we use the notation Aj = A(:,j, :) to denote the j th “column” 
or lateral slice of the third-order tensor (see the middle figure of the bottom 
row of Fig. 1). The fcth frontal slice , which is an £ x m matrix, is denoted by 
A= A(:,:,k). Frontal slices and other decompositions of a third-order tensor 
are all shown in Fig. 1. 

We can consider an / x 1 x n tensor is a matrix oriented into the third 
dimension. It will therefore be useful to use notation from [29] that allows us to 
easily move between l x n matrices and their l x 1 x n counterparts. Specifically, 
the squeeze operation on A G M Zxlxn is identical to the squeeze function in 
MATLAB: 

X = squeeze(A) => X(i, k) = X (i, 1, k). 
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Figure 1: Different representations of a third-order tensor, from [31]. Top left 
to right: column, row, and tube fibers. Bottom left to right: horizontal, lateral, 
and frontal slices. 


The vec function unwraps the tensor A into a vector of length imn by 
column stacking of frontal slices, i.e., in MATLAB notation: vec(*4) = A(:). 
For the tensor A we define the unfold and fold functions in terms of frontal 
slices: 


unf old(^4) 


/A (1) \ 

A& 


e R. 


Inxm 


\A( n ) / 


fold (unfold (^4)) = A. 


The block circulant matrix of size In x nm that is generated via unfold(^4) is 
given as 


circ(*4) = 


/AW 


1) . . 

•• ^ 2 )\ 

AW 

AW 

A( n ) ■ ■ 

. . AO) 

\A {n) 


^4(^-2) . . 

AW.) 


Definition 1 Let B G M Zxpxn andC G W >xmxn , Then the t-product from [29] 
is defined by 

A = B * C = f old(cir c{B) unf old(C)), 

from which it follows that A is an £ x m x n tensor. 


The t-product can be considered as a natural extension of the matrix multipli¬ 
cation [6]. In general the t-product is not commutative between two arbitrary 
tensors, but it is commutative between tube fibers. 


Definition 2 Given m tube fibers c^ G M lxlxn , j = 1,..., m a t-linear com¬ 
bination [30] of the lateral slices Aj G M^ xlxr \ j = 1,..., m, is defined as 

A\ * Cl + A 2 * C2 + • • • + Am * c ra = A * C , 


where 




Cl 


jmxlxn 
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The multiplication c j * Aj is not defined unless £ = 1. 

Definition 3 The identity tensor X mmn is the tensor whose first frontal slice 
is the m x m identity matrix, and whose other frontal slices are all zeros. 

Definition 4 An m x m x n tensor A has an inverse B, provided that 

A * B — Tmmn and B * A — T rrirnn . 

Definition 5 Following [29], if A is l x m x n, then the transposed tensor 
A T is the mxl xn tensor obtained by transposing each of the frontal slices and 
then reversing the order of transposed frontal slices 2 through n. 

Definition 6 Let aijk be the i,j,k element of A. Then the Frobenius norm 
of the tensor A is 


P|| F =||vecM)|| 2 


We also use the following notation: 


l rn n 


\ EEE a m- 

\i= 1 j= 1 k =1 


Mllsum = ||vec(^4)111 = W \a ijk \, ||.4|| max = ||vec(^)||oo = m&x\a ijk \. 

' i.i.k 

hj,k 


If A is a matrix then ||A|| sum = YU j \ a ij \• Let (Ji, i = 1,..., min {m,n} denote 
the singular values of A. The nuclear norm (also known as the trace norm) is 
defined as 

min{m,n} 

||t 4 ||* = trace (Vt 4 T t 4 ) = ^ Oi , 

i= 1 

where A T is the the transpose of A. 


3 The Dictionary Learning Problem 

In classical dictionary learning, the problem is to find “basis elements” and 
sparse representations of the training signals/images. That is, we want to write 
the input vectors as a weighted linear combination of a small number of (un¬ 
known) basis vectors. 

In practice, to find localized features of a training image (and to reduce the 
computational work), training patches of smaller size are extracted. Let the 
patches be of size p x r and let the matrix Y = [yi, 7 / 2 , • • •, Vt\ £ W rxt consist 
of t training patches arranged as vectors/columns of length pr. The resulting 
dictionary learning problem, based on this matrix formulation, is then defined 
as follows: For the given data matrix Y and a given number of elements s , find 
two matrices D G W rxs and H G W xt , whose product approximates Y as well 
as possible: Y « DH. The columns of the dictionary matrix D (often with 
s > pr) represents a “basis” of s image patches, and the matrix H contains 
the representation of each training image patch approximated by the dictionary 
elements. It is often assumed that, as Y is non-negative, D and H should be 
constrained to be non-negative as well. Even with a non-negativity constraint, 
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the decomposition is not unique [12]. Other constraints, such as sparseness, 
statistical independence, and low complexity are often exploited in forming the 
basis and the representation. Different prior considerations lead to different 
learning methods such as non-negative matrix factorization (NMF) [33], MOD 
[16], K-SVD [1], the online dictionary learning method [35] and many more (see, 
e.g., [15] and [24]). 

A framework for tomographic image reconstruction using matrix dictionary 
priors was developed and described in [39]. In tomography, a signal b is mea¬ 
sured from rays or signals passing through an object of interest. The discretized 
tomographic model is represented by an m x n matrix A. Considering an un¬ 
known vector x as vector of pixels/voxels for the reconstructed image; this yields 
a linear system Ax ~ b with an ill-conditioned matrix A. Generally the image 
x is not sparse but the situation changes when we know that x has a sparse 
representation in terms of a known basis. 

Using a vectorized formulation in [39], a global matrix dictionary W is 
formed from the learned patch dictionary D, and the linear tomographic prob¬ 
lem is solved such that the vectorized image x is a conic combination of a small 
number of dictionary elements, i.e., x = Wa and a is sparse, meaning that the 
solution lies in the cone spanned by dictionary images but that many of the 
weights are zero. The simplicity of this approach is that once the basis elements 
have been determined, the solution is linear in these new variables. 

Images are naturally two-dimensional objects and we find that it is funda¬ 
mentally sound to work with them in their natural form. For example we are 
looking for correlations image-to-image (not just pixel-to-pixel) that let us re¬ 
duce the overall redundancy in the data. Therefore, we will consider a collection 
of training patches as a third-order tensor, with each 2D image making up a 
slice of the data tensor. It is natural to use higher-order tensor decomposition 
approaches, which are nowadays frequently used in image analysis and signal 
processing [2, 9, 10, 22, 30]. We describe this approach in more detail in the 
next section. 

4 Tensor Dictionary Learning 

In recent years there has also been an increasing interest in obtaining a non¬ 
negative tensor factorization (NTF) (often based on CP and Tucker decomposi¬ 
tions) as a natural generalization of the NMF for a nonnegative data. Similar to 
NMF, the sparsity of the representation has been empirically observed in NTF 
based on CP and Tucker decompositions. For NTF based on a subset of tensor 
decomposition methods, we refer to [10]. Unlike the work in [10], we express the 
dictionary learning problem in a third-order tensor framework based on the t- 
product. This will be described in detail below, but the key is a t-product-based 
NTF reminiscent of the NMF. 

The NTF based on the t-product was proposed in [23], where preliminary 
work with MRI data showed the possibility that sparsity is encouraged when 
non-negativity is enforced. Here, we extend the work by incorporating sparsity 
constraints and we provide the corresponding optimization algorithm. Given 
the patch tensor dictionary X>, we compute reconstructed images that have a 
sparse representation in the space defined by the t-product and V. Thus, both 
the dictionary and the sparsity of the representation serve to regularize the 
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ill-posed problem. 


4.1 Tensor Factorization via t-Product 

Let the third-order data tensor y E M^ xtxr consist of t training image patches 
of size p x r, arranged as the lateral slices of y , i.e., 

$ = ■), for j = 

Our non-negative tensor decomposition problem, based on the t-product, is the 
problem of writing the non-negative data tensor as a product y = V * H of two 
tensors V E M pxsxr and % E R sxtxr . The tensor V consists of s dictionary 2D 
image patches of size p x r arranged as the lateral slices of £>, while H is the 
tensor of coefficients. The main difference between NTF and NMF is that the 
s x t x r tensor T-L has r times more degrees of freedom in the representation 
than the s x t matrix H. 

The t-product from Definition 1 involves unfolding and forming a block 
circulant matrix of the given tensors. Using the fact that a block circulant 
matrix can be block-diagonalized by the Discrete Fourier Transform (DFT) [20, 
§4.7.7], the t-product is computable in the Fourier domain [30]. Specifically, we 
can compute y = V * T~L by applying the DFT along tube fibers of V and T-L: 

k = 1. r, 

where ^ denotes DFT; in MATLAB notation we apply the DFT across the third 
dimension: y = f ft (A’, [ ], 3). Working in the Fourier domain conveniently 
reduces the number of arithmetic operations [22], and since the operation is 
separable in the third dimension it allows for parallelism. 

Although the representation of the training patches in the third-order tensor 
resembles the matrix formulation, it is not a re-formulation of the matrix prob¬ 
lem packaged as tensors. In fact, the tensor formulation gives a richer approach 
of formulating the problem, as we now describe. 

Recall that the j th patch Yj is the j th lateral slice of y = V * H, i.e., 
Yj = squeeze (y(:,j ,:)). Hence, as shown in [23], 


A = E squeeze (P(:, i,:)) circ ^squeeze (TL(j^ i, :) T )^. (1) 

In other words, the j th patch is a sum over all the lateral slices of X>, each one 
“weighted” by multiplication with a circulant matrix derived from a tube fiber 

of n. 

We use a small example to show why this is significant. Consider the 3x3 
downshift matrix and the (column) circulant matrix generated by the vector v : 


Z 


Noting that 


0 

A 

(Vl 

V3 

v 2 \ 

0 

° , 

C[v\ = circ('u) = V 2 

Vl 

V3 

1 

0/ 

\V3 

V2 
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3 /0 

C[v\ = ^ VkZ 1 *- 1 m v x I + v 2 1 
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it follows that 


3 

DC[v\ = ^VkDZb- 1 . 

k= 1 

Extrapolating to (1), we obtain the following result. 

Theorem 1 Let Z denote the nxn downshift matrix. With Di = squeeze (V(: 

, i,:)) and h^ = squeez e(fH(j,i, :) T ), ^ e jth image patch is given by 

Yj = D iC[h (y) ] = E (h^Di + ± hp>D t Z k -A . (2) 

i= 1 i= 1 \ k =2 / 

To show the relevance of this result we note that the product DiZ k ~ x is 
Di with its columns cyclically shifted left by k — 1 columns. Assuming that 
Di represents a “prototype” element/feature in the image, we now have a way 
of also including shifts of that prototype in our dictionary without explicitly 
storing those shifted bases in the dictionary. Note that if h = 0, k = 2,..., n 
then Yj is a (standard) linear combination of matrices Dp, this shows that our 
new approach effectively subsumes the matrix-based approach from [39], while 
making the basis richer with the storage of only a few entries of a circulant 
matrix rather than storing extra basis image patches! 

4.2 Formulation of the Tensor-Based Dictionary Learning 
Problem 

One is usually not interested in a perfect factorization of the data because over¬ 
fitting can occur, meaning that the learned parameters do fit well the training 
data, but have a bad generalization performance. This issue is solved by making 
a priori assumptions on the dictionary and coefficients. 

Based on the approximate decomposition y ~ V*l-L, we consider the generic 
tensor-based dictionary learning problem (similar to the matrix formulation 
from [39]): 

min C dic (y, V * %) + $ dic (P) + $ rep (H). (3) 

X' ,7x 

The misfit of the factorization approximation is measured by the loss function 
Cd i C5 (e.g., the Frobenius norm). Different priors on the dictionary V and the 
representation tensor Li are controlled by the regularization functions <hdic (V) 
and ^rep(^). 

NTF itself results in a sparse representation. Imposing sparsity-inducing 
norm constraints on the representation allows us to further enforce sparsity of 
the representation of the training image, i.e., the training patches being repre¬ 
sented as a combination of a small number of dictionary elements. At the same 
time this alleviates the non-uniqueness drawback of the NTF. 

Therefore, we pose the tensor dictionary learning problem as a non-negative 
sparse coding problem [25]: 

min i/2||V - V * U\\l + Ap|| sum + I D (V) + J R .xtx-(«). (4) 

Here D is a closed set defined below, Iz denotes the indicator function of a set 
Z, and A > 0 is a regularization parameter that controls the sparsity-inducing 
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penalty ||W|| sum . If we do not impose bound constraints on the dictionary 
elements, then the dictionary and coefficient tensors V and T-L can be arbitrarily 
scaled, because for any £ > 0 we have \\y — (££>) * (V^)IIf = IIA 7 — £> * 
H\\y- We define the compact and convex set D such that V G D prevents 
this inconvenience: 

D = {D e Ry sxr | i, :) || F < v 'pr, i = 1,.. . ,s}. (5) 

When r = 1 then (4) collapses to the standard non-negative sparse coding 
problem. 

4.3 The Tensor-Based Dictionary Learning Algorithm 

The optimization problem (4) is non-convex, while it is convex with respect 
to each variable V or T-L when the other is fixed. Computing a local minimizer 
can be done using the Alternating Direction Method of Multipliers (ADMM) [5] , 
which is a splitting method from the augmented Lagrangian family. We therefore 
consider an equivalent form of (4): 

minimize^^y 1 /2 \\y — U * V||| + A \\H\\ suin + I R sxtxr(l~L) + Id(P) ^ 
subject to V = U and T-L = V, 

where V,U G W )Xsxr and H, V G R sxtxr . The augmented Lagrangian for (6) is 

l p (t>,u,h,v,a,A) = y 2 \\y~u*v\\ 2 F + \\\n\\ sum + i K x tX r(n) + i D (v) 

+ A T ® {V - U) + A T ® {U -V) 

+ p(i/2||z>- will + w- VIII), 

(7) 

where A G W >xsxr and A G M sxtxr are Lagrange multiplier tensors, p > 0 
is the quadratic penalty parameter, and © denotes the Hadamard (entrywise) 
product. 

The objective function becomes separable by introducing the auxiliary vari¬ 
ables U and V. The alternate direction method is obtained by minimizing L p 
with respect to X>, H, U , V one at a time while fixing the other variables at 
their most recent values and updating the Lagrangian multipliers A and A. If 
Pd is the metric projection on D (which is computed using Dykstra’s alternating 


projection algorithm [11]), then the ADMM updates are given by: 

T> k +i = mmL p (D,Hfc,^,Vfc,4,4) = Po{Mk ~ p~ 1 Ak) (8a) 

14+1 = mmL p (Vk, T-LkMk, V, A k , Ak) (8b) 

= (ul * u k + piy 1 *(ul*y + A k + P u k ) 

U k +i= min LpiPk+tiUMkiVk+XiAi^Ak) (8c) 

HeR s + xtxr 

= P+(Sa/ p (V/c + i - p~ 1 A k )) 

U k + i = minLp(P fe +i,^fe,W, 14+i, A k , A k ) (8d) 

= (y * Vk +i + A k + pVk+i) * (Vfc+i * V k +i + pT) 

Ak+i = Ak + p(V k +£ — Uk+ 1 ) (8e) 

Afe+i = + pi'Hk^i ~ V fc+ i). (8f) 
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Here P + (0)^j = max{^ ?J -,0} and S\/ p denotes soft thresholding. The updates 
for Uk+i and Vk+i are computed in the Fourier domain. 

The KKT-conditions for (7) can be expressed as 

v = u, u = v, 

a = -(y -v*h)*h t , A = -v t 

-A e <9<J> dic (:D), -A e d$ rep (H), 

where df(X) denotes the sub-differential of / at X. The KKT conditions are 
used to formulate stopping criteria for the ADMM algorithm, and we use the 
following conditions: 

||P -Climax <c 
max(l, ||X>|| 

max) 

||i-x> r *(£>*?f-y)|| max <c 

max(l, ||yl||max) 

where e > 0 is a given tolerance. Algorithm 1 summarizes the algorithm to solve 
(4). Note that satisfaction of the KKT conditions produces a local minimum; 
this is not a guarantee of convergence to the global optimum. 


Il«-V||r 


< e, 


max(l, \\H\\ max) 

\\A-(v*n-y)*H T \\ 


max(l, 


0 


< e, 


(9a) 

(9b) 


Algorithm 1 Tensor Dictionary Learning Algorithm 

Input: Tensor of training image patches y G M^. xtxr , number of dictionary 
images s, tolerances p, e > 0. 

Output: Tensor dictionary Vk G M^ xsxr , tensor representation Hk G 

xt xr 

Initialization: Let the lateral slices of U be randomly selected training 
patches, let V be the identity tensor, let T-L = V, and let A , A be zero tensors 
of appropriate sizes. 

for k = 1,... do 

Update V k ,U k Mk, Vk,A k ,A n by means of (8). 
if all stopping criteria (9) are met then 
Exit. 

end if 
end for 


Under rather mild conditions the ADMM method can be shown to converge 
for all values of the algorithm parameter p in the Lagrange function L p (7), 
cf. [5]. Small values of p lead to slow convergence; larger values give faster 
convergence but puts less emphasis on minimizing the residual for the NTF. 
For the convergene properties of ADMM and the impact of the parameter p see 
[18] and the references therein. 

5 Tomographic Reconstruction with Tensor Dic¬ 
tionary 

Recall that a linear tomographic problem is often written Ax ~ b with A G 
M mxn , where the vector x represents the unknown M x N image, the vector b is 
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the inaccurate/noisy data, and the matrix A represents the forward tomography 
model. Since we assume that the vector x represents an image of absorption 
coefficients we impose a nonnegativity constraint on the solution. 

Without loss of generality we assume that the size of the image is a multiple 
of the patch sizes in the dictionary. We partition the image into q = ( M/p)(N/r ) 
non-overlapping patches of size (M/p) x ( N/r ), i.e., Xj E M pxr for j = 1,. .., q. 

In the matrix-based formulation of the reconstruction problem [39], once 
the patch dictionary is formed we write the image patches we want to recover 
(subvectors of the reconstructed image x) as conic combinations of the patch 
dictionary columns. The inverse problem then becomes one of recovering the 
expansion coefficients subject to non-negativity constraints (which produces a 
nonnegative x because the dictionary elements are nonnegative). 

Here we define a similar reconstruction problem in our tensor-based formula¬ 
tion. We arrange all the patches Xj of the reconstructed image as lateral slices 
of a p x q x r tensor T, i.e., 

Xj = squeeze(Aj), A?j = X(:j,:), j = l,...,q. 

Moreover, we assume that there exists a s x q x r coefficient tensor C such 
that the image patches can be written as t-linear combinations of the patch 
dictionary elements, i.e., 

X = T> * C Xj = V* Cj , j = 1 ,..., q. (10) 

where the tube fibers of Cj = C(:, j,:) can be considered as the expansion coeffi¬ 
cients. In other words, we restrict our solution so that it is a t-linear combination 
of the dictionary images. 

Then, similar to (1), each patch Xj in the reconstruction can be built from 
the matrices squeeze(D*), i = 1.. *, s: 

Xj = squeeze (V * Cj) = squeeze (2^) circ ^squeeze (Cj(i, 1, :) T )^. (11) 

2=1 

Since the circulant matrices are not scalar multiples of the identity matrix, Xj 
is not a simple linear combination of the matrices squeeze (Vi). 

Thus, we want to find a tensor C such that X = V*C solves the reconstruction 
problem, and to ensure a nonnegative reconstruction, we enforce non-negativity 
constraints on C. Then we write the vectorized image as x = nvec(D*C), where 
the permutation matrix n ensures the correct shuffling of the pixels from the 
patches. Then our generic reconstruction problem takes the form 

nun C rec (AUvec(T> * C), b) + ^ sp (C) + $ im (X> * C), C > 0. (12) 

The data fidelity is measured by the loss function £ rec , and regularization is 
imposed via ^> sp which enforces a sparsity prior on C, and <f>i m which enforces 
an image prior on the reconstruction. By choosing these three functions to be 
convex, we can solve (12) by means of convex optimization methods. 

Our patches are non-overlapping because overlapping patches tend to pro¬ 
duce blurring in the overlap regions of the reconstruction. Non-overlapping 
patches may give rise to block artifacts in the reconstruction, because the ob¬ 
jective in the reconstruction problem does not penalize jumps across the values 
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at the boundary of neighboring patches. To mitigate this type of jumps, we add 
an image penalty term <Fi m (2) * C) = 5 2 'ip(Hvec(V * C)) that discourages such 
artifacts, where S is a regularization parameter, and the function ^ is defined 

by 

^ = M(M/p - 1) + N{N/r - 1) 2 ^ Lz ^ 2 ' 

The matrix L is a defined such that Lz is a vector with finite-difference approx¬ 
imations of the vertical/horizontal derivatives across the patch boundaries. The 
denominator is the total number of pixels along the boundaries of the patches 
in the image. 

We consider two different ways to impose a sparsity prior on C in the form 
<h sp (C) = M^(C)> v ~ 1? 2, where fi is a regularization parameter and 

MQ = VdICIIsum, MC) = V^IIOUm + ||C||*), (14) 


in which the sq x r matrix C is defined as 


C = 


( squeeze (Ci)^ 


^squeeze (Cq)) 


The first prior corresponds to a standard sparsity prior in reconstruction 
problems. The second prior <^ 2 , which tends to produce a sparse and low-rank 
C, is inspired by a similar use in compressed sensing [19]. 

To summarize, we consider a reconstruction problem of the form 


minimizec ^ ||^nvec(D * C) — b ||| + /jupv (C) + 5 2 i/j(llvec(V * C)) 
subject to C > 0, 


(15) 


where ji and 5 are regularization parameters. We note that (15) is a convex but 
non-differentiable optimization problem. It is solved using the software package 
TFOCS (Templates for First-Order Conic Solvers) [3]. The implementation 
details are included in the Appendix. 

We note that imposing the non-negativity constraint on the solution implies 
that each image patch Xj belongs to a closed set defined by 

G = {V * I G M; xlxr } c M^ xlxr . (16) 

The set G is a cone, since for any G G and any nonnegative tube fiber 
c G ]R lxlxr the product * c belongs to G. Clearly, if the dictionary V 
contains the standard basis that spans M^ x 1 x r then G is equivalent to the entire 
nonnegative orthant M^ xlxr , and any image patch Xj can be reconstructed by 
a t-linear combination of dictionary basis images. However, in the typical case 
where G is a proper subset of M^ xlxr then not all nonnegative images have an 
exact representation in G, leading to an approximation error. 


6 Numerical Experiments 

We conclude with computational tests to examine the tensor formulation. All 
experiments are run in MATLAB (R2014a) on a 64-bit Linux system. The 
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reconstruction problems are solved using the software package TFOCS version 

1.3.1 [3] and compared with results from the matrix-based approach [39]. We 
use a 1600 x 1200 high-resolution photo of peppers; from this image we extract 
the p x r training image patches, as well as the 200 x 200 ground-truth or exact 
image x exact , see Fig. 2. The exact image is not contained in the training set, so 
that we avoid committing an inverse crime. All the images are gray-level and 
scaled in the interval [0,1]. 

6.1 Dictionary Learning Experiments 

Patch sizes should be sufficiently large to capture the desired structure in the 
training images, but the computational cost of the dictionary learning increases 
with the patch size. A study of the patch size p x r and number s of elements 
in [39] shows that a reasonably large patch size gives a good trade-off between 
the computational work and the approximation error by the dictionary, and 
that the over-representation factor s/(pr) can be smaller for larger patches. For 
these reasons, we have chosen p = r = 10 and (unless otherwise noted) s = 300 
for both the dictionary learning and tomographic reconstruction studies. We 
extract 52,934 patches from the high-resolution image and apply Algorithm 1 
to learn the dictionary. The tensor dictionary V and the coefficient tensor H 
are 10 x 300 x 10 and 300 x 52934 x 10, respectively. 

Convergence plots for A = 0.1, 1, and 10 are shown in Fig. 3. For A = 10 we 
put emphasis on minimizing the sparsity penalty, and after about 200 iterations 
we have reached convergence where the residual term dominates the objective 
function. For A = 0.1 we put more emphasis on minimizing the residual term, 
and we need about 500 iterations to converge; now the objective function is 
dominated by the sparsity penalty. 

Next we consider the approximation errors mentioned in the previous section. 
Following [39], a way to bound these errors is to consider how well we can 
approximate the exact image x exact with patches in the cone G (16) defined by 
the dictionary. Consider the q approximation problems for all blocks XJ xact , 
j = 1,2of the exact image: 

min^ V 2 ||^ *Cj~ y exact ||p s.t. (% > 0. 
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A, =0.1 


A=1 


A, =10 





Figure 3: Convergence of Algorithm 1 for A = 0.1, 1, and 10. We plot 1 / 2 \\y — 
V*T-L\\p + A||W|| sum versus the number of iterations. Note the different scalings 
of the axes. 



Figure 4: The mean approximation error MAE (6.1) for the tensor and ma¬ 
trix formulations versus the number of nonzeros of H and H , respectively, as 
functions of A (small A give a larger number of nonzeros). 
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Figure 5: A trade-off curve for the tensor dictionary learning problem; the red 
dot denotes the value A = 3.1623 that minimizes ||H||g Um + ||^V — D * %|||. 

If Cj * denotes the solution to the jth problem, then vec(2A * Cj*) is the best 
approximation in G of the j th block XJ xact . We define the mean approximation 
error as 



The MAE is defined analogously for the matrix formulation in [39]. Figure 
4 shows how these MAEs vary with the number of nonzeros of T-L and H, as 
a function of A, for both s = 200 and s = 300. This plot shows that for a 
given number of nonzeros in T-L or H we obtain approximately the same mean 
approximation error. In other words despite the fact that the s xtxr tensor H 
has r times more degrees of freedom in the representation than the sxt matrix 
H, we do not need more nonzero values to represent our training images. Hence 
the number of nonzeros is not a decisive factor. 

In Fig. 4 we note that for large enough A both H and H consist entirely 
of zeros, in which case the dictionaries V and D are solely determined by the 
constraints. Hence, as A increases the MAE settles at a value that is almost 
independent on A. 

To determine a suitable value of the regularization parameter A in (4) we 
plot the residual norm ||^ — V * H\\y versus 1111sum f° r various A G [0.1, 100] 
in Fig. 5. We define the optimal parameter to be the one that minimizes 
ll^llsum + \\y ~ which is obtained for A = 3.1623, and we use this value 

throughout the rest of our experiments. 

Figure 6 shows examples of tensor and matrix dictionary elements/images, 
where lateral slices of the tensor dictionary and columns of the matrix dictio¬ 
nary are represented as images. The dictionary images are sorted according to 
increasing variance. The tensor and matrix dictionary images are different but 
they are visually similar. 

We conclude these experiments with a study of how the number s of dictio¬ 
nary elements influences the dictionary, for the fixed A = 3.1623. Specifically, 
Fig. 7 shows how the density and the MAE varies with s in the range from 100 
to 500. As we have already seen for s = 300 the density of H is consistently 
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Figure 6: Examples of dictionary elements/images from the tensor formulatio 
(left) and the matrix formulation (right) with 10 x 10 patches and A = 3.1623 
and s = 300. 



Figure 7: Dependence of the dictionary on the number of dictionary elements s, 
for both the tensor and matrix formulations. Left: the density of H and H. 
Right: the MAE associated with the dictionaries. 
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much lower than that of H , and it is also less dependent on s in the tensor 
formulation. We also see that the MAE for the tensor formulation is consis¬ 
tently lower for the tensor formulation: even with s = 400 dictionary elements 
in the matrix formulation we cannot achieve the tensor formulation’s low MAE 
for s = 100. These results confirm our intuition that the tensor formulation is 
better suited for sparsely representing the training image, because due to the 
ability of capturing repeating fectures we can use a much smaller dictionary. 

6.2 Reconstruction Experiments 

In this section we present numerical experiments for 2D tomographic reconstruc¬ 
tion in few-projection and noisy settings. We perform two different experiments 
to analyze our algorithm: first we examine the role of different regularization 
terms and then we study the reconstruction quality in different tomography 
scenarios. We also present results using a more realistic test problem. 

Recall from Section 5 that our reconstruction model has the form Ax ~ b 
with A E M mxn . We consider parallel-beam geometry and the test problem is 
generated by means of the function paralleltomo from AIR TOOLS [21]. The 
exact data is generated by the forward model 6 exact = Ax exact , to which we add 
white Gaussian noise. 

The accuracy of the reconstruction is measured by the relative 2-norm error 
RE = ||x exact -x|| 2 /||x exact || 2 . 

We also report the structural similarity index measure (SSIM) [42] (recall that 
a larger SSIM means a better reconstruction). We remind that the error is due 
to the combination of the approximation error, the error from the data, and the 
regularization error. 

The parameters S and fi in the reconstruction problem (15) both play a role 
in terms of regularization; to simplify (15) we set r = fi/q. As described in 
[39] , a nonnegative constraint in the reconstruction problem plays an extra role 
of regularization and therefore the reconstruction is not very sensitive to the 
regularization parameters S and r, hence they are chosen from a few numerical 
experiments such that a solution with the smallest error is obtained. 

We compare our method with filtered back projection (FBP), Tikhonov regu¬ 
larization, and total variation (TV). The FBP solution is computed using MAT- 
LAB’s iradon function with the “Shepp-Logan” filter. The Tikhonov solution 
solves the problem 

min \\Ax - b\\\ + Axikhlklli 

X 

where Axikh is the Tikhonov regularization parameter. The TV regularization 
problem has the form 

n 

min !/2 || Ax - b\\\ + A T v ^||D- d x|| 2 , 0 < x < 1, 

i=1 

where Dj d computes a finite-difference approximation of the gradient at each 
pixel, and Axv is the TV regularization parameter. We solve this problem with 
the software T VReg [26] . The Tikhonov and TV regularization parameters are 
chosen to yield the smallest reconstruction error. 
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Table 1: Comparison of the best solutions computed by different reconstruction 
methods. The bold numbers indicate the lowest iteration number, density of C 
and compression percentages, and highest SSIM measure. 


Method 


Itr.# Density% Compr.% RE% SSIM 


FBP: filtered back projection 
Tikhonov regularization: 

TV: total variation 
Matrix formulation [39] 36843 

Tensor alg., ||C|| sum reg. 48787 

Tensor alg., |jcjj sum + ||C||* reg. 8002 


- 

- 

54.81 

0.2981 

- 

- 

21.99 

0.5010 

- 

- 

21.37 

0.4953 

12.53 

5.31 

22.00 

0.4903 

5.30 

0.67 

22.21 

0.4890 

10.27 

3.26 

21.55 

0.5061 


The computational bottleneck of the objective function evaluation in solving 
(15) is calculating V * C, where V G M pxsxr and Q ^ M s xgxr_ Recall that the 
computation is done in the Fourier domain, and since log(r) < q,p the com¬ 
putational complexity of the t-product is 0(sqpr + s(p + q)r log(r)) = O(sqpr) 
[22]. In the matrix formulation [39] the computational bottleneck is the ma¬ 
trix multiplication D reshape (<a, s, q) where D G W >rxs and a G M sgrXl , also 
with complexity O(sqpr). This gives the tensor formulation an advantage, since 
we can use a much smaller s here, say, 2-3 times smaller than in the matrix 
formulation. 

Since computation times vary between different computers, and since we 
did not pay specific attention to efficiency, we report the number of objective 
function evaluations returned by TFOCS. We stop the iterations when the 
relative change in the iteration vector is less than 10 -7 . For the comparison to 
be fair, the starting point in all the computations is the zero vector/matrix of 
appropriate size. 

6.2.1 Study of Regularization Terms 

We solve the reconstruction problem using the exact image shown in Fig. 2. 
Moreover, we use 10 x 10 patches, s = 300, and A = 3.1623. For the problems 
in this section we use 7V P = 25 projections, N r = 283 rays per projection, and 
1% noise. We compare two different regularization terms in the reconstruction 
problem (15). The 1-norm (sparsity) regularization ||C|| sum is similar to the 1- 
norm regularization in the dictionary learning problem (4). The regularization 
term ||C|| sum + ||C||^ results in coefficient tensors that are simultaneously low 
rank and sparse. 

We compare the tensor reconstruction solution with the solutions obtained 
by the matrix formulation [39] as well as FBP, Tikhonov regularization, and 
TV. The reconstructions are shown in Fig. 8. The corresponding relative errors, 
SSIM, and densities of C as well as the number of objective function evaluation 
are listed in Table 1. The table also lists the compressibility, defined as the 
percentage of coefficients which have values larger than 10 -4 . Both the density 
and the compressibility show that we obtain very sparse representations of the 
reconstructed image. 

The FBP, Tikhonov, and TV methods fail to produce desirable reconstruc¬ 
tions, although the 2-norm reconstruction error for the TV solution is slightly 
smaller than that for our solutions. The RE and SSIM do not tell the full story, 
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W 3 - 16 \v = 183 



(a) FBP (b) Tikhonov (c) TV 

x=0.021 5, 5=13.34 x=0.0215, 5=10 x=0.0215, 5=10 



(d) Matrix formulation (e) Tensor: ||C|| sum (f) Tensor: ||C|| sum + ||C||* 


Figure 8: Comparison of the best solutions computed by different reconstruction 
methods. Subfigures (e) and (f) correspond to our new tensor formulation with 
two different regularization terms; we used A = 3.1623. 
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and using a dictionary clearly improves recovering the texture of the image. The 
reconstructed images in Fig. 8 are similar across the matrix and tensor formula¬ 
tions; however, the results in Table 1 show that the tensor-formulation solution 
is more than 5 times more compressed and more than 2 times sparser than the 
matrix-formulation solution. Imposing both sparsity and low-rank regulariza¬ 
tion || ' 11sum + ||*||* produces a marginally more accurate solution with a denser 
representation. 

6.2.2 More Challenging Tomographic Reconstructions 

To further study the performance and robustness of our tensor formulation 
approach, we consider problems with more noise in the data, or with projection 
angles in a limited range, still using the same test problem. Knowing that 
FBP, Tikhonov, and TV give unsatisfactory solutions for such problems, we 
only compare our method with the matrix formulation approach, and again we 
consider both regularization terms ||C|| sum and ||C|| sum + ||(7||* in (15). 

• First we compute a reconstruction with N p = 50 projections, uniform 
angular sampling in [0°,180°] and with relative noise level 1%. In this 
scenario we use more projection data than in the previous section. 

• Next we use 50 and 25 projections uniformly distributed in the limited 
range [0°, 120°] and with relative noise level 1%. 

• Finally we use 25 and 50 projections with uniform angular sampling in 
[0°, 180°] and with relative noise level 5%, i.e., a higher noise level than 
above. 

The reconstructions are shown in Fig. 9; they are similar across the tensor and 
matrix formulations, and pronounced artifacts have appeared from the limited 
angles and the higher noise level. 

Table 2 lists the corresponding relative error, SSIM, density, and compress¬ 
ibility together with the iteration number. Comparison of Tables 1 and 2 reveal 
the same pattern. Algorithm 1 converges faster when imposing the combined 
regularization term ||C|| sum + ||C||*, and this choice also slightly improves the re¬ 
construction in all scenarios. However, enforcing only the sparsity prior ||C|| sum 
significantly reduce the representation redundancy, leading to a very sparse 
representation comparing to the matrix formulation. In the scenario with 50 
projections and 1% noise, where the regularization and perturbation errors are 
less dominating, the improvement in reconstructions by the tensor algorithm — 
compared to the matrix formulation — is more pronounced. Overall, we recom¬ 
mend the use of ||C|| sum + \\C\\^ which leads to the faster algorithm. 

6.2.3 A Larger Test Problem 

Tomography is a common tool in materials science to study the structure of 
grains in polycrystalline materials such as metals. The grain structure is some¬ 
times known a priori in the form of training images. As test image in this 
experiment we use a high-resolution image of zirconium grains (produced by a 
scanning electron microscope) of dimension 760 x 1020 shown in Fig. 10. 

Training patches of size 10 x 10 are again extracted from the high-resolution 
image to learn matrix and tensor dictionaries size 100 x 300 and 10 x 300 x 
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(a) N p = 50 
angles in [0°, 180°] 
noise 1% 


x=0.01,8=10 x=0.0147, 8=10 x=0.01,8=10 



(b) N p = 50 
angles in [0°, 120°] 
noise 1% 


x=0.0032, 8=10 x=0.0022, 8=10 x=0.01,8=10 



(c) Np = 25 
angles in [0°, 120°] 
noise 1% 


x=0.01,8=13.34 x=0.01,8=10 x=0.01,8=10 



(d) Np = 50 
angles in [0°, 180°] 
5% 


x=0.0215, 8=1000 x=0.1,8=100 x=0.0464, 8=13.34 



x=0.1468, 8=237.14 x=0.2154, 8=100 


x=0.1,8=31.62 


(e) Np = 25 

angles in [0°, 180°] noise 
5% 



Matrix alg. 


Tensor alg. Tensor alg. 

||C||sum reg. ||C||sum + HCIU reg. 


Figure 9: Reconstruction experiments from Section 6.2.2 with A = 3.1623. 
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Table 2: Comparison of tensor and matrix formulation reconstructions in the ex¬ 
periments from Section 6.2.2. The methods “Matrix,” “Tensor-1,” and “Tensor- 
2” refer to the matrix-formulation algorithm and our new tensor-formulation 
algorithm with regularization terms ||C|| sum and ||C|| sum + ||C||*. The bold num¬ 
bers indicate the lowest iteration number, density and compression, and the 
highest SSIM. 


Settings 

Method 

Itr.# 

Density% 

Compr.% 

RE% 

SSIM 

TVp = 50 

Matrix 

41204 

20.70 

8.80 

17.70 

0.6368 

angles in [0°, 180°] 

Tensor-1 

52801 

4.46 

0.79 

17.19 

0.6560 

noise 1% 

Tensor-2 

15676 

17.39 

1.84 

16.82 

0.6688 

N p = 50 

Matrix 

48873 

14.4575 

9.43 

22.77 

0.5695 

angles in [0°, 120°] 

Tensor-1 

61106 

9.08 

0.98 

22.80 

0.5818 

noise 1% 

Tensor-2 

16177 

23.81 

2.07 

22.49 

0.5883 

N p = 25 

Matrix 

45775 

100 

5.91 

25.46 

0.4536 

angles in [0°, 120°] 

Tensor-1 

59347 

26.00 

0.73 

25.85 

0.4544 

noise 1% 

Tensor-2 

17053 

27.49 

2.29 

25.33 

0.4676 

N p = 50 

Matrix 

110322 

50.17 

8.02 

22.05 

0.4910 

angles in [0°, 180°] 

Tensor-1 

40695 

8.97 

0.74 

21.84 

0.4846 

noise 5% 

Tensor-2 

10392 

14.64 

1.72 

21.81 

0.5107 

N p = 25 

Matrix 

72139 

45.51 

6.29 

24.69 

0.3768 

angles in 0°, 180°] 

Tensor-1 

37072 

8.60 

0.64 

25.12 

0.3738 

noise 5% 

Tensor-2 

9076 

13.28 

2.4829 

24.67 

0.4041 



Figure 10: Left: the high-resolution image of zirconium grains used to generate 
the training patches. Right: the exact image of size 520 x 520. 
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x=0.01,5=316.23, ?i=3.1623 



Figure 11: Reconstructions for the realistic test problem, computed with the 
matrix formulation (a) and the tensor formulation (b) + (c). 


10, respectively. To avoid committing inverse crime, we first rotate the high- 
resolution image and then extract the exact image of dimensions 520 x 520 — also 
shown in Fig. 10. 

We use a parallel-beam setting with N p = 50 projection angles in [0°, 180°] 
and N r = 707 rays per projection, and again the matrix is computed by means 
of the function paralleltomo from AIR TOOLS [21]. We added 1% white 
Gaussian noise to the clean data. This problem is highly underdetermined with 
m = 36750 measurements and n = 270400 unknowns. 


Table 3: Comparison of reconstruction in the realistic test problem, using the 
matrix and tensor formulations. The bold numbers indicate the lowest iteration 
number, density, and compression, and highest SSIM. 


Method 

Itr.# 

Density% 

Compr.% 

RE% 

SSIM 

Matrix formulation [39] 

73961 

48.61 

6.86 

14.90 

0.4887 

Tensor alg., ||C|| sum reg. 

74310 

33.18 

0.76 

15.23 

0.4793 

Tensor alg., ||C|| sum + ||C||* reg. 

24396 

38.78 

3.17 

14.80 

0.5035 


Figure 11 shows the reconstructed images with the matrix and tensor for¬ 
mulations. All regularization parameters are chosen empirically to give the 
smallest reconstruction errors. All three reconstructions are similar, since the 
reconstruction errors are dominated by the error coming from the regulariza¬ 
tion of the noisy data. More data are given in Table 3. Imposing the sparsity 
prior ||C11 S um in the tensor formulation produces the sparsest representation. 
The solution is computed in fewer iterations with the ||C|| sum + ||(7||* regulariza¬ 
tion term while the reconstruction has a negligible improvement in terms of RE 
and SSIM. We conclude that our tensor algorithm is also well suited for more 
realistic tomographic problems. 

7 Conclusion 

We presented the problem of dictionary learning in a tensor formulation and 
focused on solving the tomographic image reconstruction in the context of a 
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t-product tensor-tensor factorization. We posed the tensor dictionary learning 
problem as a non-negative sparse tensor factorization problem, and we computed 
a regularized nonnegative reconstruction in the tensor-space defined by the t- 
product. We also gave an algorithm based on the alternating direction method 
of multipliers (ADMM) for solving the tensor dictionary learning problem and, 
using the tensor dictionary, we formulated a convex optimization problem to 
recover the solution’s coefficients in the expansion under the t-product. 

We presented numerical experiments on the properties of the representation 
in the learned tensor dictionary in the context of tomographic reconstruction. 
The dictionary-based reconstruction quality is superior to well know classical 
regularization schemes, e.g., filtered back projection and total variation, and the 
solution representation in terms of the tensor dictionary is more sparse compared 
to a similar matrix dictionary representations [39]. In future work the authors 
would like to further study the tensor dictionary representation property using 
other products from the family of tensor-tensor products introduced, e.g., in 


[27]. 
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Appendix A: Reconstruction Solution Via TFOCS 


The reconstruction problem (15) is a convex, but ||C|| sum and ||C||* are not dif¬ 
ferentiable which rules out conventional smooth optimization techniques. The 
TFOCS software [3] provides a general framework for solving convex optimiza¬ 
tion problems, and the core of the method computes the solution to a standard 
problem of the form 


minimize l(A(x) — b) + h(x), 


(17) 


where the functions l and h are convex, A is a linear operator, and b is a vector; 
moreover l is smooth and h is non-smooth. 

To solve problem (15) by TFOCS, it is reformulated as a constrained linear 
least squares problem: 





where c = ^2(M(M/p — 1) + N(N/r — 1)). Referring to (17), /(•) is the squared 
2 -norm residual and h(-) = [up v (-). 

The methods used in TFOCS require computation of the proximity operators 
of the non-smooth function h. The proximity operator of a convex function is a 
natural extension of the notion of a projection operator onto a convex set [11]. 
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Let / = ||C|| sum = yC'llsum and g = ||C||* be defined on the set of real¬ 
valued matrices and note that dom/ fl domg ^ 0. For Z G M mxn consider the 
minimization problem 

minimize^ f(X) + g(X) + 1 / 2 \\X - Z ||| (19) 


whose unique solution is X = proxj + ^(Z). While the prox operators for ||C|| sum 
and He'd* are easily computed, the prox operator of the sum of two functions is 
intractable. Although the TFOCS library includes implementations of a variety 
of prox operators — including norms and indicator functions of many common 
convex sets — implementation of prox operators of the form proxj + ^(-) is left 
out. Hence we compute the prox operator for || • || sum + || • ||* iteratively using a 
Dykstra-like proximal algorithm [11], where prox operators of || * || sum and || • ||* 
are consecutively computed in an iterative scheme. 

Let r = jjb/q > 0. For f(X ) = r||X|| sum and X > 0, prox^ is the one-sided 
elementwise shrinkage operator 


VTOX f (X)ij 


o, Xij > T 

< Xij - r, \X id \ < T 
,0, Xij < -T 


The proximity operator of g(X) = r||X||* has an analytical expression via the 
singular value shrinkage (soft threshold) operator 

prox^pf) = £7diag(cq - r) V T , 

where X = UYi V T is the singular value decomposition of X [8] . The computa¬ 
tion of t||( 71|* can be done very efficiently since C is sq x r with r <C sq. 

The iterative algorithm which computes an approximate solution to proxj +5/ 
is given in Algorithm 2. Every sequence Xk generated by Algorithm 2 converges 
to the unique solution prox^ + ^ of problem (19) [11]. 


Algorithm 2 Dykstra-Like Proximal Algorithm 
Input: The matrix Z 
Output: proxj + ^(Z) 

Initialization: Set X\ — Z and set Pi and Q\ to zero matrices of appro¬ 
priate sizes, 
for k = 1, 2,... do 

Y k = pro x g (X k +P k ) 

P k +i = X k + P k — Y k 
X k+ i = pro x f (Y k + Q k ) 

Qk+i = Y k + Qk ~ X k +i 

if \\Y k - X fe+1 || F < 10 -3 then 
Exit 

end if 
end for 
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